This section shows some key figures describing the data set, such as the number of unique strains with or without ccyA, or
the number of assemblies with or without ccyA present in both RefSeq and GenBank.
The ccyA~ label indicates that an assembly is found in both the GenBank and RefSeq databases,
but the GenBank version and the RefSeq version do not share the same genotype for the ccyA gene
(i.e., one is ccyA+ and the other is ccyA-).
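The ccyA~ rule can be sketched as a small function. This is a minimal illustration, not the code behind this page: the function name and the encoding of a genotype as True (ccyA+), False (ccyA-) or None (assembly absent from that database) are assumptions.

```python
from typing import Optional


def combined_ccya_label(genbank: Optional[bool], refseq: Optional[bool]) -> str:
    """Combine the ccyA genotypes of the GenBank and RefSeq copies of one assembly.

    Hypothetical encoding: True = ccyA+, False = ccyA-, None = not in that database.
    """
    present = [g for g in (genbank, refseq) if g is not None]
    if not present:
        raise ValueError("the assembly must be present in at least one database")
    if len(present) == 2 and present[0] != present[1]:
        return "ccyA~"  # GenBank and RefSeq versions disagree on ccyA
    return "ccyA+" if present[0] else "ccyA-"


print(combined_ccya_label(True, False))  # ccyA~
print(combined_ccya_label(True, True))   # ccyA+
print(combined_ccya_label(None, False))  # ccyA-
```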
RedLevel stands for redundancy level. GenBank and RefSeq assemblies each have a unique identifier, but the same assembly can be found in GenBank
and/or RefSeq. An assembly can also have multiple versions (for example, with minor changes to its annotation). Several levels were therefore defined
to handle the redundancy introduced by assembly versioning and by the GenBank/RefSeq database pair. There are 4 redundancy levels:
Organism: we consider only one assembly per strain (e.g., if there are 13 ccyA+ assemblies for Microcystis aeruginosa, only one of them is considered).
UID: we consider each assembly only once, whether it is present in GenBank and/or RefSeq and in however many versions.
UIDV: we consider each assembly version only once, whether it is present in GenBank and/or RefSeq.
Accession: we consider every version of every assembly in both GenBank and RefSeq.
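A minimal sketch of how the four redundancy levels could collapse a list of assemblies. It assumes NCBI-style accessions (GCA_ prefix for GenBank, GCF_ for RefSeq, a shared numeric part and a .N version suffix) and assumes that the GenBank and RefSeq copies of one assembly version carry the same numeric part and version. The function names and the toy accessions and organisms are made up for illustration.

```python
import re
from typing import Dict, Iterable, Tuple

# Hypothetical record: (accession, organism name).
Record = Tuple[str, str]

# Assumed accession shape: GCA_/GCF_ + 9 digits + "." + version.
ACC_RE = re.compile(r"^(GCA|GCF)_(\d{9})\.(\d+)$")

LEVELS = ("Accession", "UIDV", "UID", "Organism")


def level_key(record: Record, level: str):
    """Return the key under which records are merged at a given redundancy level."""
    accession, organism = record
    match = ACC_RE.match(accession)
    if match is None:
        raise ValueError(f"unexpected accession format: {accession}")
    _database, uid, version = match.groups()
    if level == "Accession":
        return accession        # every GenBank and RefSeq record counts
    if level == "UIDV":
        return (uid, version)   # GCA/GCF copies of one version are merged
    if level == "UID":
        return uid              # all versions of one assembly are merged
    if level == "Organism":
        return organism         # one assembly per organism
    raise ValueError(f"unknown level: {level}")


def count_by_level(records: Iterable[Record]) -> Dict[str, int]:
    """Count distinct keys at each redundancy level."""
    records = list(records)
    return {level: len({level_key(r, level) for r in records}) for level in LEVELS}


toy = [
    ("GCA_000000001.1", "Organism A"),
    ("GCF_000000001.1", "Organism A"),
    ("GCA_000000001.2", "Organism A"),
    ("GCF_000000001.2", "Organism A"),
    ("GCA_000000002.1", "Organism A"),
    ("GCA_000000003.1", "Organism B"),
]
print(count_by_level(toy))  # {'Accession': 6, 'UIDV': 4, 'UID': 3, 'Organism': 2}
```

Each key is derived from the previous one, so counts can only decrease from Accession to UIDV and UID, and further to Organism as long as each assembly belongs to a single organism.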
Graphical overview.
The first chart in this section shows the growing number of genomes and calcyanins over time.
The second one is a sunburst plot, which shows the number of sequences per category in a hierarchical way, starting from the N-ter type.
Clicking on a specific area shows the number of sequences for each subcategory.
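The hierarchical counts behind a sunburst plot can be sketched as nested counters: the inner ring counts sequences per N-ter type, and the outer ring splits each type by a second category. Here the second category is assumed to be the flag, and the values flag_1 and flag_2 are placeholders, not the labels used on this page.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def sunburst_counts(seqs: Iterable[Tuple[str, str]]) -> Dict[str, Dict[str, int]]:
    """Count sequences per N-ter type (inner ring), then per flag (outer ring)."""
    tree: Dict[str, Counter] = defaultdict(Counter)
    for nter_type, flag in seqs:
        tree[nter_type][flag] += 1
    return {nter: dict(flags) for nter, flags in tree.items()}


# Toy data: (N-ter type, hypothetical flag) for each sequence.
seqs = [
    ("CoBaHMA", "flag_1"),
    ("CoBaHMA", "flag_1"),
    ("CoBaHMA", "flag_2"),
    ("X", "flag_1"),
]
counts = sunburst_counts(seqs)
print(counts)  # {'CoBaHMA': {'flag_1': 2, 'flag_2': 1}, 'X': {'flag_1': 1}}
# Inner-ring totals are the sums of their subcategories.
print({nter: sum(f.values()) for nter, f in counts.items()})  # {'CoBaHMA': 3, 'X': 1}
```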
NOTE: The line plot below shows the total number of calcyanin sequences found across all assemblies. It may therefore include duplicated sequences due to GenBank/RefSeq versioning and assembly versioning.
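A sketch of the cumulative counts such a line plot could be built from, and of how duplicates inflate them. It assumes each calcyanin sequence found in an assembly is recorded as a (year, protein sequence) pair and treats identical protein sequences as duplicates; both choices are assumptions, and SEQ_A and SEQ_B are placeholders.

```python
from itertools import accumulate
from typing import Dict, List, Tuple


def cumulative_by_year(items: List[Tuple[int, str]], unique: bool = False) -> Dict[int, int]:
    """Cumulative number of sequences per year.

    unique=False counts every occurrence, as described in the NOTE.
    unique=True counts each distinct sequence once, in the first year it appears.
    """
    seen = set()
    per_year: Dict[int, int] = {year: 0 for year, _ in items}  # keep every year on the axis
    for year, seq in sorted(items):
        if unique and seq in seen:
            continue  # already counted in an earlier (or the same) year
        seen.add(seq)
        per_year[year] += 1
    years = sorted(per_year)
    return dict(zip(years, accumulate(per_year[y] for y in years)))


items = [(2018, "SEQ_A"), (2019, "SEQ_A"), (2019, "SEQ_B"), (2021, "SEQ_B")]
print(cumulative_by_year(items))               # {2018: 1, 2019: 3, 2021: 4}
print(cumulative_by_year(items, unique=True))  # {2018: 1, 2019: 2, 2021: 2}
```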
Sunburst
Treemap
Calcyanin classification.
The decision tree below is used to classify sequences with a significant match against the GlyX3 HMM profile.
Red and green edges indicate negative and positive answers, respectively.
In short, for sequences with a match against the GlyX3 HMM profile, we check the presence and order (along the sequence)
of each glycine zipper, and we use a set of known N-ter sequences to infer the nature of the N-terminal extremity of those
sequences. Finally, a label is assigned to each sequence depending on its modular organization.
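A toy version of the two questions described above (are the glycine zippers present and in order, and which known N-ter matches?) can be written as nested conditions. It is not the tree shown on this page: the zipper names Gly1, Gly2 and Gly3, the output label format and the words complete, reordered and partial are all hypothetical, and the N-ter matching step is assumed to have already produced the name of the best known N-ter hit (or None).

```python
from typing import Dict, Optional

EXPECTED_ORDER = ("Gly1", "Gly2", "Gly3")   # hypothetical glycine zipper names
KNOWN_NTER = {"CoBaHMA", "X", "Y", "Z"}     # N-ter types listed on this page


def classify(has_glyx3_match: bool,
             zipper_starts: Dict[str, int],
             nter_hit: Optional[str]) -> Optional[str]:
    """Assign a hypothetical label from zipper presence/order and the N-ter hit."""
    if not has_glyx3_match:
        return None  # no significant GlyX3 match: not classified
    found = [z for z in EXPECTED_ORDER if z in zipper_starts]
    complete = len(found) == len(EXPECTED_ORDER)
    positions = [zipper_starts[z] for z in found]
    ordered = positions == sorted(positions)  # zippers appear in the expected order
    nter_type = nter_hit if nter_hit in KNOWN_NTER else "Unknown"
    if complete and ordered:
        zippers = "complete"
    elif complete:
        zippers = "reordered"
    else:
        zippers = "partial"
    return f"{nter_type}-type/{zippers}"


print(classify(True, {"Gly1": 200, "Gly2": 260, "Gly3": 320}, "CoBaHMA"))  # CoBaHMA-type/complete
print(classify(True, {"Gly1": 200, "Gly3": 320}, None))                   # Unknown-type/partial
```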
Modular Organization.
This section is dedicated to the modular organization of calcyanins.
Sequences are grouped by N-ter type regardless of their flag (see the Calcyanin classification section).
This makes it possible to visualize the length of the sequences and the position of the different domains.
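A text-only sketch of the same idea: group sequences by N-ter type, ignoring the flag, then draw each sequence as a fixed-width track scaled to its length. The record layout, the domain labels Nter and GlyX3, the flags and the coordinates are hypothetical toy values.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical domain record: (label, start, end), 1-based inclusive positions.
Domain = Tuple[str, int, int]


def draw_layout(length: int, domains: List[Domain], width: int = 40) -> str:
    """Draw one sequence as a track; each domain uses the first letter of its label."""
    track = ["-"] * width
    for label, start, end in domains:
        lo = (start - 1) * width // length       # scale start to track coordinates
        hi = max(lo + 1, end * width // length)  # keep very short domains visible
        for i in range(lo, min(hi, width)):
            track[i] = label[0]
    return "".join(track) + f" {length} aa"


def group_by_nter(seqs: List[dict]) -> Dict[str, List[dict]]:
    """Group sequences by N-ter type, whatever their flag."""
    groups: Dict[str, List[dict]] = defaultdict(list)
    for s in seqs:
        groups[s["nter_type"]].append(s)
    return dict(groups)


seqs = [
    {"id": "seq1", "nter_type": "CoBaHMA", "flag": "flag_1", "length": 100,
     "domains": [("Nter", 1, 20), ("GlyX3", 61, 100)]},
    {"id": "seq2", "nter_type": "CoBaHMA", "flag": "flag_2", "length": 80,
     "domains": [("Nter", 1, 20), ("GlyX3", 41, 80)]},
]
for nter, members in group_by_nter(seqs).items():
    print(f"{nter}-type")
    for s in members:
        print(f"  {s['id']}: " + draw_layout(s["length"], s["domains"], width=10))
# CoBaHMA-type
#   seq1: NN----GGGG 100 aa
#   seq2: NN---GGGGG 80 aa
```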
CoBaHMA-type
X-type
Z-type
Y-type
Unknown-type
No data for this kind of N-ter
Browse data
This section allows you to browse specific information about an assembly and/or a protein.
The input fields above the list can be used to filter entries by their main attributes.
Clicking on an entry gives you access to the protein sequence(s) attached to it, if any. Related
information about the assembly and/or the sequence is shown at the end of the section.
Additionally, you can use the green icon to the right of a ccyA+ entry to add it to the cart.
You can filter data by organism name, accession, sequence accession, flag or N-ter type using one or more of the search bars below. Keyword order does not matter.
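One plausible reading of these filters, sketched under assumptions: each search bar targets one field, every whitespace-separated keyword typed in a bar must occur in that field (case-insensitive, in any order), and all non-empty bars must match at once. The field names and toy entries are hypothetical.

```python
from typing import Dict, Iterable, List


def filter_entries(entries: Iterable[Dict[str, str]],
                   filters: Dict[str, str]) -> List[Dict[str, str]]:
    """Keep entries that match every search bar.

    filters maps a (hypothetical) field name to the text typed in its search bar.
    Within one bar, every keyword must occur in the field, in any order.
    An empty bar matches everything.
    """
    def keep(entry: Dict[str, str]) -> bool:
        for field, query in filters.items():
            value = entry.get(field, "").lower()
            if not all(keyword in value for keyword in query.lower().split()):
                return False
        return True

    return [entry for entry in entries if keep(entry)]


entries = [
    {"organism": "Microcystis aeruginosa", "nter_type": "CoBaHMA"},
    {"organism": "Microcystis sp.", "nter_type": "X"},
]
# Keyword order does not matter within a bar.
print(len(filter_entries(entries, {"organism": "aeruginosa microcystis"})))          # 1
print(len(filter_entries(entries, {"organism": "microcystis", "nter_type": "x"})))  # 1
```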